AITopics | feature collapse

Normative and task-driven theories offer powerful top-down explanations for biological systems, yet the goals of quantitatively arbitrating between competing theories, and utilizing them as inductive biases to improve data-driven fits of real biological datasets are prohibitively laborious, and often impossible. To this end, we introduce a Bayesian meta-learning framework designed to automatically convert raw functional predictions from normative theories into tractable probabilistic models. We employ adaptive deep kernel Gaussian processes, meta-learning a kernel on synthetic data generated from a normative theory. This Theory-Informed Kernel specifies a probabilistic model representing the theory predictions -- usable for both fitting data and rigorously validating the theory. As a demonstration, we apply our framework to the early visual system, using efficient coding as our normative theory. We show improved response prediction accuracy in ex vivo recordings of mouse retinal ganglion cells stimulated by natural scenes compared to conventional data-driven baselines, while providing well-calibrated uncertainty estimates and interpretable representations. Using exact Bayesian model selection, we also show that our informed kernel can accurately infer the degree of theory-match from data, confirming faithful encapsulation of theory structure. This work provides a more general, scalable, and automated approach for integrating theoretical knowledge into data-driven scientific inquiry in neuroscience and beyond.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.24919

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Austria (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Add feedback

818f4654ed39a1c147d1e51a00ffb4cb-Paper.pdf

Neural Information Processing SystemsAug-15-2025, 12:22:54 GMT

augmented shortcut, shortcut, transformer, (15 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

4b3cc0d1c897ebcf71aca92a4a26ac83-Supplemental-Conference.pdf

Neural Information Processing SystemsAug-14-2025, 16:31:58 GMT

ddiag, exp, normalized, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Add feedback

On the Effectiveness of Random Weights in Graph Neural Networks

Bui, Thu, Schönlieb, Carola-Bibiane, Ribeiro, Bruno, Bevilacqua, Beatrice, Eliasof, Moshe

arXiv.org Machine LearningJan-31-2025

Graph Neural Networks (GNNs) have achieved remarkable success across diverse tasks on graph-structured data, primarily through the use of learned weights in message passing layers. In this paper, we demonstrate that random weights can be surprisingly effective, achieving performance comparable to end-to-end training counterparts, across various tasks and datasets. Specifically, we show that by replacing learnable weights with random weights, GNNs can retain strong predictive power, while significantly reducing training time by up to 6$\times$ and memory usage by up to 3$\times$. Moreover, the random weights combined with our construction yield random graph propagation operators, which we show to reduce the problem of feature rank collapse in GNNs. These understandings and empirical results highlight random weights as a lightweight and efficient alternative, offering a compelling perspective on the design and training of GNN architectures.

artificial intelligence, machine learning, rap-gnn, (17 more...)

arXiv.org Machine Learning

2502.0019

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Probabilistic Skip Connections for Deterministic Uncertainty Quantification in Deep Neural Networks

Jimenez, Felix, Katzfuss, Matthias

arXiv.org Machine LearningJan-8-2025

Deterministic uncertainty quantification (UQ) in deep learning aims to estimate uncertainty with a single pass through a network by leveraging outputs from the network's feature extractor. Existing methods require that the feature extractor be both sensitive and smooth, ensuring meaningful input changes produce meaningful changes in feature vectors. Smoothness enables generalization, while sensitivity prevents feature collapse, where distinct inputs are mapped to identical feature vectors. To meet these requirements, current deterministic methods often retrain networks with spectral normalization. Instead of modifying training, we propose using measures of neural collapse to identify an existing intermediate layer that is both sensitive and smooth. We then fit a probabilistic model to the feature vector of this intermediate layer, which we call a probabilistic skip connection (PSC). Through empirical analysis, we explore the impact of spectral normalization on neural collapse and demonstrate that PSCs can effectively disentangle aleatoric and epistemic uncertainty. Additionally, we show that PSCs achieve uncertainty quantification and out-of-distribution (OOD) detection performance that matches or exceeds existing single-pass methods requiring training modifications. By retrofitting existing models, PSCs enable high-quality UQ and OOD capabilities without retraining.

feature collapse, feature vector, intermediate layer, (14 more...)

arXiv.org Machine Learning

2501.04816

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Add feedback

EXAONEPath 1.0 Patch-level Foundation Model for Pathology

Yun, Juseung, Hu, Yi, Kim, Jinhyung, Jang, Jongseong, Lee, Soonyoung

arXiv.org Artificial IntelligenceAug-22-2024

Recent advancements in digital pathology have led to the development of numerous foundational models that utilize self-supervised learning on patches extracted from gigapixel whole slide images (WSIs). While this approach leverages vast amounts of unlabeled data, we have discovered a significant issue: features extracted from these self-supervised models tend to cluster by individual WSIs, a phenomenon we term WSI-specific feature collapse. This problem can potentially limit the model's generalization ability and performance on various downstream tasks. To address this issue, we introduce EXAONEPath, a novel foundational model trained on patches that have undergone stain normalization. Stain normalization helps reduce color variability arising from different laboratories and scanners, enabling the model to learn more consistent features. EXAONEPath is trained using 285,153,903 patches extracted from a total of 34,795 WSIs. Our experiments demonstrate that EXAONEPath significantly mitigates the feature collapse problem, indicating that the model has learned more generalized features rather than overfitting to individual WSI characteristics. We compared EXAONEPath with state-of-the-art models across six downstream task datasets, and our results show that EXAONEPath achieves superior performance relative to the number of WSIs used and the model's parameter count. This suggests that the application of stain normalization has substantially improved the model's efficiency and generalization capabilities.

foundation model, licensee, wsis, (17 more...)

arXiv.org Artificial Intelligence

2408.0038

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Discriminant Distance-Aware Representation on Deterministic Uncertainty Quantification Methods

Zhang, Jiaxin, Das, Kamalika, Kumar, Sricharan

arXiv.org Artificial IntelligenceFeb-19-2024

Uncertainty estimation is a crucial aspect of deploying dependable deep learning models in safety-critical systems. In this study, we introduce a novel and efficient method for deterministic uncertainty estimation called Discriminant Distance-Awareness Representation (DDAR). Our approach involves constructing a DNN model that incorporates a set of prototypes in its latent representations, enabling us to analyze valuable feature information from the input data. By leveraging a distinction maximization layer over optimal trainable prototypes, DDAR can learn a discriminant distance-awareness representation. We demonstrate that DDAR overcomes feature collapse by relaxing the Lipschitz constraint that hinders the practicality of deterministic uncertainty methods (DUMs) architectures. Our experiments show that DDAR is a flexible and architecture-agnostic method that can be easily integrated as a pluggable layer with distance-sensitive metrics, outperforming state-of-the-art uncertainty estimation methods on multiple benchmark problems.

latent representation, prototype, representation, (12 more...)

arXiv.org Artificial Intelligence

2402.12664

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Spain (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
(2 more...)

Add feedback

Normalization Is All You Need: Understanding Layer-Normalized Federated Learning under Extreme Label Shift

Zhang, Guojun, Beitollahi, Mahdi, Bie, Alex, Chen, Xi

arXiv.org Artificial IntelligenceAug-18-2023

Layer normalization (LN) is a widely adopted deep learning technique especially in the era of foundation models. Recently, LN has been shown to be surprisingly effective in federated learning (FL) with non-i.i.d. data. However, exactly why and how it works remains mysterious. In this work, we reveal the profound connection between layer normalization and the label shift problem in federated learning. To understand layer normalization better in FL, we identify the key contributing mechanism of normalization methods in FL, called feature normalization (FN), which applies normalization to the latent feature representation before the classifier head. Although LN and FN do not improve expressive power, they control feature collapse and local overfitting to heavily skewed datasets, and thus accelerates global training. Empirically, we show that normalization leads to drastic improvements on standard benchmarks under extreme label shift. Moreover, we conduct extensive ablation studies to understand the critical factors of layer normalization in FL. Our results verify that FN is an essential ingredient inside LN to significantly improve the convergence of FL while remaining robust to learning rate choices, especially under extreme label shift where each client has access to few classes.

artificial intelligence, machine learning, normalization, (18 more...)

arXiv.org Artificial Intelligence

2308.09565

Country: Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Collaborating Authors

feature collapse

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

4b3cc0d1c897ebcf71aca92a4a26ac83-Supplemental-Conference.pdf

818f4654ed39a1c147d1e51a00ffb4cb-Paper.pdf

Meta-Learning Theory-Informed Inductive Biases using Deep Kernel Gaussian Processes

818f4654ed39a1c147d1e51a00ffb4cb-Paper.pdf

4b3cc0d1c897ebcf71aca92a4a26ac83-Supplemental-Conference.pdf

On the Effectiveness of Random Weights in Graph Neural Networks

Probabilistic Skip Connections for Deterministic Uncertainty Quantification in Deep Neural Networks

EXAONEPath 1.0 Patch-level Foundation Model for Pathology

Discriminant Distance-Aware Representation on Deterministic Uncertainty Quantification Methods

Normalization Is All You Need: Understanding Layer-Normalized Federated Learning under Extreme Label Shift